Refactor the baseline Validation Pipeline to align with GSoC Architecture by ayushman1210 · Pull Request #4017 · PecanProject/pecan

ayushman1210 · 2026-05-31T10:22:51Z

This issue tracks the follow-up changes required for the dataframe-first validation framework introduced in PR #3892, aligning it with the architecture outlined in my GSoC 2026 workplan.

Planned Changes

Modular 4-Stage Pipeline: Refactor the monolithic run_benchmark() entry point into a flexible 4-stage pipeline: Validate -> Align -> Compute -> Plot.
Base R / dplyr Integration: Replace the existing naive alignment logic with robust base R operations (e.g. findInterval()) and dplyr for spatial and temporal alignment, ensuring high performance while removing external dependencies.
BETYdb Decoupling: Ensure that the pipeline is completely independent of BETYdb IDs, relying solely on standard variable names and tabular data structures.
Generalized Metrics: Expand the metric computation (RMSE, MAE, R2, correlation) to ensure they operate cleanly on the aligned data frames, independent of legacy database structures.
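The staged design in the list above can be sketched in a few lines of base R. This is a minimal illustration, not the PR's implementation: the names validate_stage, align_stage, compute_stage and run_pipeline_sketch are hypothetical, alignment here is an exact-timestamp merge rather than the findInterval() approach, and the Plot stage is left out to keep the sketch dependency-free.

```r
# Hypothetical stage functions; none of these names come from the PR.
validate_stage <- function(model_df, obs_df) {
  # Both inputs must carry the standard columns before anything else runs.
  stopifnot(
    all(c("time", "value") %in% names(model_df)),
    all(c("time", "value") %in% names(obs_df))
  )
  invisible(TRUE)
}

align_stage <- function(model_df, obs_df) {
  # Simplest possible alignment: keep only timestamps present in both inputs.
  merged <- merge(model_df, obs_df, by = "time", suffixes = c(".model", ".obs"))
  data.frame(time = merged$time, model = merged$value.model, obs = merged$value.obs)
}

compute_stage <- function(aligned) {
  err <- aligned$model - aligned$obs
  c(RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
}

run_pipeline_sketch <- function(model_df, obs_df) {
  validate_stage(model_df, obs_df)
  aligned <- align_stage(model_df, obs_df)
  list(aligned = aligned, metrics = compute_stage(aligned))
}

times <- as.POSIXct(c(0, 3600, 7200), origin = "1970-01-01", tz = "UTC")
res <- run_pipeline_sketch(
  data.frame(time = times, value = c(1, 2, 3)),
  data.frame(time = times, value = c(1, 2, 5))
)
```

Because each stage takes and returns plain data frames, a caller can stop after any stage or swap one stage out without touching the others.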

Context

PR #3892 established the baseline dataframe-first concept. The changes tracked in this issue will evolve it into the generalized, high-performance toolkit planned for Phase 2 of my GSoC project.

infotroph · 2026-06-05T12:42:14Z

data.table Integration: Replace the existing naive alignment logic with high-performance data.table operations

Please convert this to a dplyr or base R approach. data.table is a good package that lots of people use happily, but it is not widely used in PEcAn and its syntax is confusing to folks who don't know it already.

divine7022

Overall it's in good shape. I dropped inline comments on the things I'd want looked at before merge.

divine7022 · 2026-06-11T13:03:28Z

+      numer <- sum((dat$obs - mean(dat$obs, na.rm=T)) * (dat$model - mean(dat$model, na.rm=T)), na.rm=T)
+      denom <- sqrt(sum((dat$obs - mean(dat$obs, na.rm=T))^2, na.rm=T)) * sqrt(sum((dat$model - mean(dat$model, na.rm=T))^2, na.rm=T))
+      (numer / denom)^2
+    }


The aligned data frame returns c("time", "model", "obs"), but the convention in this module is c("model", "obvs", "time"), which every metric reads. Any reason align_by_time doesn't follow it?

divine7022 · 2026-06-11T13:24:12Z

+res <- run_benchmark(
+  model_path = "inst/testdata/sample_model.csv",
+  obs_path   = "inst/testdata/sample_obs.csv"
+)


You're using model_path = ..., obs_path = ..., but the signature is run_benchmark(model_df, obs_df, ...). It looks like the README is from an earlier API where the function took paths; folks using this example would hit an error. You probably want to update this and the parameter list right below it so that data frames are passed instead.
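A README example in the data frame style might look like the sketch below. It is an assumption about what the updated quickstart could be, not the author's text: read_benchmark_csv is a hypothetical helper, the CSVs are written to temporary files so the snippet is self-contained, and the final run_benchmark() call is commented out because it needs the package installed.

```r
# Write two small CSVs to temporary files (stand-ins for the package's test data).
times <- c("1970-01-01 00:00:00", "1970-01-01 01:00:00",
           "1970-01-01 02:00:00", "1970-01-01 03:00:00")
model_csv <- tempfile(fileext = ".csv")
obs_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(time = times, value = c(1.0, 2.0, 3.0, 4.0)),
          model_csv, row.names = FALSE)
write.csv(data.frame(time = times, value = c(1.1, 1.9, 3.2, 3.9)),
          obs_csv, row.names = FALSE)

# Hypothetical helper: read a CSV and convert 'time' to POSIXct in UTC.
read_benchmark_csv <- function(path) {
  df <- utils::read.csv(path, stringsAsFactors = FALSE)
  df$time <- as.POSIXct(df$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df
}

model_df <- read_benchmark_csv(model_csv)
obs_df <- read_benchmark_csv(obs_csv)

# With the package loaded, the data frames are passed directly:
# res <- run_benchmark(model_df = model_df, obs_df = obs_df)
```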

divine7022 · 2026-06-11T13:45:15Z

+#' Load and standardize arbitrary tabular data using a YAML mapping configuration
+#'
+#' @param data.path character, file path to the tabular data (e.g. .csv)
+#' @param mapping.path character, file path to the YAML mapping configuration
+#' @return A standardized data frame with column names mapped to PEcAn standard vocabulary
+#' @export


I couldn't find load_and_map_data in NAMESPACE or as a man page; it looks like document() hasn't been re-run.

I'm also wondering about the registry format. We had been discussing Frictionless Data / datapackage.json: was the simpler YAML approach an intentional Phase 2 simplification, with Frictionless coming later, or did the direction shift?
Either is fine, I just want to make sure we're on the same page. I'd defer to @dlebauer, who probably has perspective here.

divine7022 · 2026-06-11T13:51:55Z

+bm_validate <- function(model_df, obs_df) {
+  for (df in list(model_df, obs_df)) {
+    if (!inherits(df$time, "POSIXct"))
+      stop("Column 'time' must be POSIXct, got: ", class(df$time))
+    if (!is.numeric(df$value))
+      stop("Column 'value' must be numeric, got: ", class(df$value))
+  }
+  invisible(TRUE)
+}


Anyone who passes a data frame with a column named timestamp or date instead of time hits df$time returning NULL, so the error reads as a type failure. That is technically correct, but it doesn't tell the user that the actual problem is a missing column.
Worth checking column existence before type.
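One way to order the checks, as a sketch: test for the required columns first and name the missing ones, then check types. The function name bm_validate_sketch and the exact message wording are illustrative, and stop() is used only to keep the snippet self-contained.

```r
bm_validate_sketch <- function(model_df, obs_df) {
  required <- c("time", "value")
  inputs <- list(model_df = model_df, obs_df = obs_df)
  for (name in names(inputs)) {
    df <- inputs[[name]]
    # 1. Existence: report exactly which columns are absent and what was found.
    missing_cols <- setdiff(required, names(df))
    if (length(missing_cols) > 0) {
      stop(name, " is missing required column(s): ",
           paste(missing_cols, collapse = ", "),
           "; found: ", paste(names(df), collapse = ", "))
    }
    # 2. Types: only reached when both columns exist.
    if (!inherits(df$time, "POSIXct")) {
      stop(name, "$time must be POSIXct, got: ",
           paste(class(df$time), collapse = "/"))
    }
    if (!is.numeric(df$value)) {
      stop(name, "$value must be numeric, got: ",
           paste(class(df$value), collapse = "/"))
    }
  }
  invisible(TRUE)
}

good <- data.frame(
  time = as.POSIXct(c(0, 3600), origin = "1970-01-01", tz = "UTC"),
  value = c(1, 2)
)
bad <- data.frame(timestamp = good$time, value = c(1, 2))

# The misnamed column now produces a message that names the real problem.
msg <- tryCatch(bm_validate_sketch(good, bad), error = conditionMessage)
```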

divine7022 · 2026-06-11T13:53:19Z

+      stop("Column 'time' must be POSIXct, got: ", class(df$time))
+    if (!is.numeric(df$value))
+      stop("Column 'value' must be numeric, got: ", class(df$value))


Please use PEcAn.logger throughout, so messages consistently land in the workflow log.

divine7022 · 2026-06-11T14:20:55Z

+compute_metrics <- function(aligned, metrics = c("RMSE", "MAE", "R2")) {
+  # Future-proofing: Functions in the registry now accept the full aligned dataframe
+  # This aligns with the decoupled metric architecture introduced in PR #3888
+  METRIC_REGISTRY <- list(
+    RMSE = function(dat) sqrt(mean((dat$model - dat$obs)^2, na.rm = TRUE)),
+    MAE  = function(dat) mean(abs(dat$model - dat$obs), na.rm = TRUE),
+    R2   = function(dat) {


The registry is in the right shape. One thing, though: it's closed inside the function body, so adding a new metric means editing this function rather than registering one externally. Worth moving it to a package-level object that callers can extend.

Bigger question: what's the plan for prediction interval coverage and PMU? Together they form the CAR SEP regulatory pass criterion: bias <= PMU AND >= 90% coverage. Neither metric is in PEcAn.benchmark today, and neither is in this PR. They need different inputs than the current dat$model / dat$obs (PMU needs per-treatment-pair SE and replicate counts; coverage needs per-prediction quantiles).
Worth thinking about whether the input should grow to support them now rather than reworking it later.

NSE / MEF is a third metric worth adding. It isn't required for regulatory purposes, but it is standard in the broader literature for cross-paper comparability.
Cheap if you're already adding metrics.
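A package-level registry could be sketched as an environment plus a small registration function. Everything named here (.metric_registry, register_metric, compute_metrics_sketch) is hypothetical and only illustrates the shape; NSE is registered as an example of adding a metric without editing the compute function, using the usual definition 1 - sum((obs - model)^2) / sum((obs - mean(obs))^2).

```r
# Package-level store; in a package this would live at the top level of an R/ file.
.metric_registry <- new.env(parent = emptyenv())

register_metric <- function(name, fn) {
  stopifnot(is.character(name), length(name) == 1, is.function(fn))
  assign(name, fn, envir = .metric_registry)
  invisible(name)
}

# Built-in metrics, registered the same way external callers would register theirs.
register_metric("RMSE", function(dat) sqrt(mean((dat$model - dat$obs)^2, na.rm = TRUE)))
register_metric("MAE", function(dat) mean(abs(dat$model - dat$obs), na.rm = TRUE))

# Nash-Sutcliffe efficiency, computed on rows where both values are present.
register_metric("NSE", function(dat) {
  ok <- stats::complete.cases(dat$model, dat$obs)
  obs <- dat$obs[ok]
  model <- dat$model[ok]
  1 - sum((obs - model)^2) / sum((obs - mean(obs))^2)
})

compute_metrics_sketch <- function(aligned, metrics = c("RMSE", "MAE")) {
  unknown <- setdiff(metrics, ls(.metric_registry))
  if (length(unknown) > 0) {
    stop("Unknown metric(s): ", paste(unknown, collapse = ", "))
  }
  vapply(
    metrics,
    function(m) get(m, envir = .metric_registry, inherits = FALSE)(aligned),
    numeric(1)
  )
}

perfect <- data.frame(model = c(1, 2, 3), obs = c(1, 2, 3))
mean_only <- data.frame(model = c(2, 2, 2), obs = c(1, 2, 3))
out_perfect <- compute_metrics_sketch(perfect, c("RMSE", "NSE"))
out_mean <- compute_metrics_sketch(mean_only, "NSE")
```

Metrics that need richer inputs, such as coverage or PMU, would still require the aligned data frame to carry extra columns; the registry only addresses where metric functions live, not what they receive.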

divine7022 · 2026-06-11T14:25:06Z

+  time  = as.POSIXct(seq(0, 3600*3, by = 3600), origin = "1970-01-01", tz = "UTC"),
+  value = c(1.1, 1.9, 3.2, 3.9)
+)
+


A couple of gaps here. None of the tests exercise the R2 path, so the bug I flagged there ships green.
Worth tightening at least the R2 path, the alignment value assertion, and the tolerance-dropping case before this lands.
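As an illustration of what an R2 test could assert, the sketch below compares a copy of the R2 formula from the diff against stats::cor() on complete pairs. The specific bug flagged in review is not quoted in this thread, so the NA case here is only one plausible failure mode, chosen as an assumption. The names r2_as_in_diff and r2_pairwise are hypothetical, and the stopifnot() checks translate directly to testthat::expect_equal().

```r
# Copy of the formula shown in the diff (each mean and sum drops NAs independently).
r2_as_in_diff <- function(dat) {
  numer <- sum((dat$obs - mean(dat$obs, na.rm = TRUE)) *
                 (dat$model - mean(dat$model, na.rm = TRUE)), na.rm = TRUE)
  denom <- sqrt(sum((dat$obs - mean(dat$obs, na.rm = TRUE))^2, na.rm = TRUE)) *
    sqrt(sum((dat$model - mean(dat$model, na.rm = TRUE))^2, na.rm = TRUE))
  (numer / denom)^2
}

# Reference: squared Pearson correlation on rows where both values are present.
r2_pairwise <- function(dat) {
  ok <- stats::complete.cases(dat$model, dat$obs)
  stats::cor(dat$model[ok], dat$obs[ok])^2
}

complete <- data.frame(obs = c(1.1, 1.9, 3.2, 3.9), model = c(1, 2, 3, 4))
with_na <- data.frame(obs = c(1, 2, 3, 4, NA), model = c(1.1, NA, 3.2, 3.9, 5))

# On complete data the two agree, so a test can pin R2 to cor()^2.
stopifnot(isTRUE(all.equal(r2_as_in_diff(complete), r2_pairwise(complete))))

# With unpaired NAs the diff's formula mixes rows from different subsets
# and no longer matches the pairwise value.
gap <- abs(r2_as_in_diff(with_na) - r2_pairwise(with_na))
```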

divine7022 · 2026-06-11T14:40:00Z

+    obs = obs_df$value[nearest_idx][valid]
+  )
+
+  return(aligned)


The tolerance filter is silent here. Model and obs typically run at different timesteps in validation work, so the default tolerance can drop most model points without the user knowing.
Adding a one-line info log of kept vs. dropped points would make this visible.
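A sketch of nearest-neighbour alignment with a kept/dropped report follows. It assumes a findInterval()-based lookup like the one the PR describes, but the function name align_by_time_sketch, the tolerance_secs argument and its default are hypothetical. message() stands in for the logger call so the snippet runs without PEcAn.logger; inside the package, PEcAn.logger::logger.info() would be the natural replacement.

```r
align_by_time_sketch <- function(model_df, obs_df, tolerance_secs = 1800) {
  obs_df <- obs_df[order(obs_df$time), ]  # findInterval() needs sorted breakpoints
  mt <- as.numeric(model_df$time)
  ot <- as.numeric(obs_df$time)

  # For each model time, the bracketing obs indices (clamped to the valid range).
  lo <- pmax(findInterval(mt, ot), 1L)
  hi <- pmin(lo + 1L, length(ot))
  nearest_idx <- ifelse(abs(ot[hi] - mt) < abs(mt - ot[lo]), hi, lo)

  # Keep only model points whose nearest observation is within tolerance.
  valid <- abs(ot[nearest_idx] - mt) <= tolerance_secs

  # Make the filter visible: how many model points survived, how many did not.
  message(sprintf(
    "align_by_time: kept %d of %d model points (%d dropped, tolerance = %g s)",
    sum(valid), length(valid), sum(!valid), tolerance_secs
  ))

  data.frame(
    time  = model_df$time[valid],
    model = model_df$value[valid],
    obs   = obs_df$value[nearest_idx][valid]
  )
}

# Hourly model output against two sparse observations.
model_df <- data.frame(
  time  = as.POSIXct(seq(0, 3600 * 3, by = 3600), origin = "1970-01-01", tz = "UTC"),
  value = c(1, 2, 3, 4)
)
obs_df <- data.frame(
  time  = as.POSIXct(c(0, 3600 * 3 + 60), origin = "1970-01-01", tz = "UTC"),
  value = c(1.1, 3.9)
)
aligned <- align_by_time_sketch(model_df, obs_df)
```

In this example half of the model points fall outside the tolerance, which is exactly the situation the log line is meant to surface.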

divine7022 · 2026-06-11T14:42:35Z

+plot_time_series <- function(aligned) {
+  ggplot2::ggplot(aligned, ggplot2::aes(x = .data$time)) +
+    ggplot2::geom_line(ggplot2::aes(y = .data$model, color = "Model")) +
+    ggplot2::geom_line(ggplot2::aes(y = .data$obs, color = "Obs")) +
+    ggplot2::labs(color = "", y = "value", title = "Model vs Observations") +
+    ggplot2::theme_bw()


Observations are typically sparse relative to model output in validation workflows. Plotting both as geom_line connects the obs points with segments that imply information they don't carry. The convention is points for obs and a line for the model.
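The suggested convention amounts to a one-layer change, sketched below under the assumption that ggplot2 is installed and that the aligned data frame has time, model and obs columns as in the diff. The name plot_time_series_sketch is hypothetical.

```r
plot_time_series_sketch <- function(aligned) {
  ggplot2::ggplot(aligned, ggplot2::aes(x = .data$time)) +
    # Model output is dense, so a line is appropriate.
    ggplot2::geom_line(ggplot2::aes(y = .data$model, color = "Model")) +
    # Observations are sparse, so draw them as points with no connecting segments.
    ggplot2::geom_point(ggplot2::aes(y = .data$obs, color = "Obs")) +
    ggplot2::labs(color = "", y = "value", title = "Model vs Observations") +
    ggplot2::theme_bw()
}

aligned <- data.frame(
  time  = as.POSIXct(seq(0, 3600 * 3, by = 3600), origin = "1970-01-01", tz = "UTC"),
  model = c(1, 2, 3, 4),
  obs   = c(1.1, 1.9, 3.2, 3.9)
)
p <- plot_time_series_sketch(aligned)
```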

anshul23102 and others added 10 commits May 28, 2026 22:37

feat(benchmark): add run_benchmark MVP with alignment, metrics, and t…

7df98d9

…est data

feat(benchmark): add tests and update README with quickstart

b2731cd

docs(benchmark): add roxygen man page for run_benchmark

6769cf9

docs(benchmark): add man pages and update NAMESPACE

f2aa206

fix(benchmark): fix NAMESPACE export and run_benchmark.Rd usage format

d53d372

fix(benchmark): use multi-line usage format and fix author name

d56416b

refactor(benchmark): dataframe-first API, add bm_validate/compute_met…

4df7daf

…rics/plot_time_series, update tests

docs(benchmark): add man pages for bm_validate, compute_metrics, plot…

bac2138

…_time_series

fix(benchmark): use .data$ in aes() to fix R CMD check NOTE

1a77447

Phase 2: Refactor benchmark pipeline and add data intake API

0887a47

github-actions Bot added tests modules labels May 31, 2026

Merge branch 'develop' into gsoc/phase2-architecture

403a564

ayushman1210 added the GSOC label Jun 1, 2026

divine7022 self-requested a review June 1, 2026 17:42

ayushman1210 added 3 commits June 2, 2026 00:09

Merge branch 'develop' into gsoc/phase2-architecture

954b123

Merge branch 'develop' into gsoc/phase2-architecture

a96665d

Merge branch 'develop' into gsoc/phase2-architecture

4b6baef

Merge branch 'develop' into gsoc/phase2-architecture

760c455

divine7022 added this to GSOC Benchmarking and Validation Jun 10, 2026

github-project-automation Bot moved this to Todo in GSOC Benchmarking and Validation Jun 10, 2026

divine7022 moved this from Todo to Review in GSOC Benchmarking and Validation Jun 10, 2026

used baseR findInterval() funct

e24195b

divine7022 requested changes Jun 11, 2026

divine7022 mentioned this pull request Jun 11, 2026

feat(benchmark): add run_benchmark() with dataframe-first pipeline, metrics, and tests #3892

Open

14 tasks

ayushman1210 wants to merge 16 commits into PecanProject:develop from ayushman1210:gsoc/phase2-architecture

4 participants
